Abstract. We investigate in detail a 2-level algorithm for the computation of 2-point functions of fuzzy Wilson loops in lattice gauge theory. Its performance and the optimization of its parameters are described in the context of 2+1D SU(2) gluodynamics. In realistic calculations of glueball masses, we find that the reduction in CPU time for given error bars on the correlator varies between 1.5 and 7 for the lightest glueballs in the non-trivial symmetry channels. This holds at a time-separation of about 0.2 fm, where a mass plateau sets in. Only for the lightest glueball is the 2-level algorithm not helpful. For heavier states, or at larger time-separations, the gain increases exponentially in mt, as expected. We present further physics applications in 2+1 and 3+1 dimensions and for different gauge groups that confirm these conclusions.



1 Introduction 



The study of pure gauge theories on the lattice at zero temperature has for a few years now been in an era of precision "numerical experiments". Highlights include the following:
- the determination of the low-lying glueball spectrum in 2+1 [1] and 3+1 [2] dimensions;
- the confining string spectrum ([3] and [4]);
- ratios of stable string tensions in SU(N) gauge theories ([5], [6]).

The reasons for this progress lie both in the increase in computing power and in the development of new numerical techniques. These include fuzzing procedures ([7] and [8]), improved actions (e.g. [2]) and multi-level algorithms (MLA) ([9], [10] and [11]). The purpose of this paper is to investigate the performance of the version proposed in [11].

Retrospectively, we consider that the first multi-level algorithm was proposed in [9]. It bears the name of "multihit method" and consists in replacing the link variables by their average under fixed nearest-neighbour links when computing a Wilson or Polyakov loop. It is a realisation of the real-space renormalisation group transformation. Only much more recently was it realised [10] that, thanks to the locality property, the idea can be applied more generally and iteratively, by performing nested averages under fixed boundary conditions (BC).

Multi-level algorithms are of course particularly powerful in theories with a mass gap, where distant regions of the lattice are almost uncorrelated. An impressive increase in performance with respect to the ordinary 1-level algorithm was achieved for the Polyakov loop correlator: the improvement is proportional to the area spanned by the two loops.

A further step towards generalization was taken in [11], where the algorithm was adapted to any functional of the links that can be factorized, including fuzzy operators. Indeed the factors need not even be gauge-invariant. The efficiency of the algorithm rests on the fact that the UV fluctuations can be averaged out separately for each factor. This effectively achieves n^{n_f} measurements while actually computing only n of them, where n is the number of measurements done at the lower level of the algorithm and n_f the number of factors. The choice of the factorization is thus dictated by a competition between two requirements: having as many factors as possible, and making each factor as independent of the BC as possible.

In glueball and string tension calculations, the smearing [7] and blocking [8] techniques, used in conjunction with the variational method [?], have become part of the well-established machinery for efficiently determining the glueball spectrum. Such fuzzy operators typically have large overlaps onto the fundamental state in the studied symmetry channel, and are far less sensitive to UV fluctuations than bare Wilson loops.

For asymptotically large Euclidean time separations, the multi-level algorithm clearly becomes more efficient than the 1-level algorithm. The question of real practical interest, however, is whether one can truly improve the efficiency of realistic calculations. In the case of glueballs, the operators typically have already reached mass plateaux at 0.2 fm. In such a regime, one cannot expect a reduction of the statistical error by orders of magnitude. Only a numerical analysis can reliably address the question formulated above.

It is equally important to determine whether the efficiency of the algorithm is maintained as the lattice spacing is decreased. Indeed, the correlation length becomes larger and larger in lattice units. One might therefore wonder whether the low-level measurements at fixed BCs still help to reduce the dominant fluctuations on the correlator.

The outline of this paper is as follows:
- After a description of the algorithm (section 2), we investigate in section 3 the dependence of the 2-level algorithm on its parameters, in the context of glueball 2-point function calculations.
- In section 4, we demonstrate numerically that the parameters of the algorithm can easily be optimized, and compare its efficiency to that of the 1-level algorithm as the continuum is approached. Special attention is paid to operators bearing vacuum quantum numbers, for which the vacuum expectation value (VEV) must be subtracted carefully.
- In section 5 we present miscellaneous physics applications (glueballs and k-strings in 2+1 and 3+1 dimensions).
- We end with a summary and an outlook on possible future developments.
- The appendix contains a comment on the Lüscher-Weisz algorithm.

2 Algorithm description 

In this section, we describe the implementation of a 2-level algorithm for the measurement of 2-point correlation functions. We use the isotropic Wilson action. The update is done with compound sweeps consisting of a 1+3 mixture of heat-bath [21] and over-relaxation [22] sweeps.

The operators are smeared, blocked, definite-momentum operators in 2+1 dimensional SU(2) gauge theory. We shall use glueball operators as examples. However, subsequent sections will show that the conclusions apply to the measurement of fuzzy spatial flux-tubes as well.

The details of the algorithm [11] are the following:
1. The lattice size is N_x × N_y × N_t. We perform a number N_up of compound update sweeps.
2. We then freeze N_t/Δ time-slices separated by a distance Δ.
3. By doing n updates under these fixed BCs, we measure the average values ⟨O(t_i)⟩_bc of the operators in all the other time-slices t_i between the fixed time-slices. The average values in each time-slice are kept separately.
4. The averages are written to disk before the full lattice is updated again.

N_up is typically chosen to be n/10. This represents a negligible amount of computer time while nevertheless ensuring good statistical independence of the "compound measurements" (this will be checked in section 3).

After N_bc of these compound measurements, the correlator for t ≥ 2a can easily be computed 'off-line' from

$$\langle O(t_i)\,O(t_j)\rangle = \frac{1}{N_{bc}} \sum_{bc} \langle O(t_i)\rangle_{bc}\,\langle O(t_j)\rangle_{bc} \qquad (1)$$

if the time-slices t_i and t_j do not belong to the same "time-block". This equation holds because the BCs have been generated with the weighting given by the full lattice action [10].
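A minimal sketch of this off-line step follows. It assumes the stored BC averages are held in a NumPy array `bc_avg` of shape (N_bc, N_t), and that the frozen time-slices sit at t = 0, Δ, 2Δ, .... Both are illustrative conventions, not the storage format of the runs reported here. The function name `two_level_correlator` is likewise hypothetical.

```python
import numpy as np


def two_level_correlator(bc_avg, ti, tj, delta):
    """Two-level estimate of <O(ti) O(tj)> from stored BC averages.

    bc_avg[b, t] holds <O(t)>_bc, the average of O(t) over the n
    submeasurements done under boundary condition b.  Frozen slices are
    assumed (hypothetical convention) to sit at t = 0, delta, 2*delta, ...
    """
    if ti % delta == 0 or tj % delta == 0:
        raise ValueError("frozen time-slices carry no BC average")
    if ti // delta == tj // delta:
        raise ValueError("ti and tj lie in the same time-block")
    # Product of the two block averages, then average over the N_bc BCs.
    return float(np.mean(bc_avg[:, ti] * bc_avg[:, tj]))


# Toy usage: 3 BCs, N_t = 16, delta = 8.
bc_avg = np.zeros((3, 16))
bc_avg[:, 1] = [1.0, 2.0, 3.0]
bc_avg[:, 9] = [2.0, 2.0, 2.0]
print(two_level_correlator(bc_avg, 1, 9, delta=8))  # 4.0
```

The product is formed between averages taken in different blocks before the BC average. When both slices lie in the same block, the product must instead be formed inside each BC.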

A few comments on data storage are in order. If N_op operators are being measured, the amount of data generated is

$$nb(data)_{II} = N_{op} \times N_t \times N_{bc}.$$

This is to be contrasted with the ordinary way of storing the data. There, the correlation matrix is computed during the simulation and stored in typically N_bin = O(100) bins:

$$nb(data)_{I} \approx N_{op}^2 \times N_t \times N_{bin}.$$

The ratio is thus 

$$\frac{nb(data)_{II}}{nb(data)_{I}} \approx \frac{1}{N_{op}}\,\frac{N_{bc}}{N_{bin}}.$$

As an example, for a large production run with a total of 10^6 measurements, we may do n = 10^3 measurements under each of N_bc = 10^3 fixed BCs. Therefore, for N_op > 10 (which is usually the case), the data size is smaller than with the 1-level algorithm. The obvious advantage of version II is that one can use a much larger number of operators (e.g. include non-zero momenta, scattering states, the squares of the traces of operators, ...). Further advantages include:

• one can choose the binning a posteriori, thus making a more detailed 
check of auto-correlations possible; 

• if e.g. Δ = 8 and one is computing the 2-point function at t = 5, there are several ways to obtain it. These of course all have the same average, but different variances, and it is very convenient to be able to choose the optimal combination a posteriori (see section 5);

• in principle, one can extract 3- and 4-point functions from the same data set, as long as one correlates operators that have been averaged in different time-blocks. Derivatives of the 2-point function can be computed just as easily.
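The storage estimate above is simple arithmetic. The following sketch, with a hypothetical helper `storage_ratio`, reproduces the ratio of the two schemes for the example numbers quoted (N_bc = 10^3, N_bin = 100). N_t = 32 is an arbitrary choice and cancels in the ratio.

```python
def storage_ratio(n_op, n_t, n_bc, n_bin):
    """nb(data)_II / nb(data)_I for the two storage schemes."""
    size_two_level = n_op * n_t * n_bc       # one BC average per operator and slice
    size_one_level = n_op**2 * n_t * n_bin   # binned N_op x N_op correlation matrices
    return size_two_level / size_one_level


# N_bc = 10**3 and N_bin = 100 as in the example; N_t = 32 is arbitrary here.
for n_op in (5, 10, 50):
    print(n_op, storage_ratio(n_op, 32, 10**3, 100))
```

The ratio reduces to 10/N_op, so it drops below one once more than ten operators are measured.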
On the other hand, a downside of this way of proceeding is that one loses information on the short-range correlator (0 and 1 lattice spacing of Euclidean time separation). The time-zero correlator is useful because it allows one to evaluate the overlap of the original operators onto the physical states. In some cases it may be desirable to store the BC-averages of the short-range correlators, since

$$\langle O(t_i)\,O(t_j)\rangle = \frac{1}{N_{bc}} \sum_{bc} \langle O(t_i)\,O(t_j)\rangle_{bc} \qquad (3)$$

if the time-slices t_i and t_j belong to the same time-block. Incidentally, for t_i = t_j this measurement can be useful for predicting the performance of the algorithm at larger time-separations, as we shall see in section 4.

3 The algorithm and its parameters 

We now present data obtained at β = 12, V = 32^3 in the 2+1D SU(2) pure gauge theory. Note that a√σ = 0.1179(5) [1]. Using √σ = 420 MeV, this means that a = 0.055 fm. We are therefore well in the scaling region as far as the low-energy observables are concerned.
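As a quick check of the conversion, the sketch below (a hypothetical helper, using ħc = 197.327 MeV fm) turns the dimensionless a√σ into a lattice spacing in fm. It also shows that four lattice spacings at β = 12 correspond to about 0.22 fm.

```python
HBARC_MEV_FM = 197.327  # hbar * c in MeV fm


def lattice_spacing_fm(a_sqrt_sigma, sqrt_sigma_mev=420.0):
    """Convert a dimensionless a*sqrt(sigma) into a lattice spacing in fm."""
    return a_sqrt_sigma / sqrt_sigma_mev * HBARC_MEV_FM


a = lattice_spacing_fm(0.1179)
print(f"a = {a:.4f} fm, 4a = {4 * a:.3f} fm")  # a = 0.0554 fm, 4a = 0.222 fm
```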

We first check the auto-correlation of compound measurements done at fixed BCs. We then study the efficiency of the algorithm as a function of its various parameters: first, the width Δ of the time-block inside which the submeasurements are made; secondly, the number of submeasurements. Finally, we look at the dependence of the error bars on the mass of the state being measured.

Binning analysis. In fig. 1 we show the error bar on the correlator and its local effective mass (LEM) as a function of the number of jackknife bins. The operator lies in the A_2 irreducible representation (IR), which contains spins 0^-, 4, 8, 12, .... As long as the number of bins is not much smaller than 100, the error bars are stable under changes of the binning. Obviously the error bars are themselves subject to fluctuations, and in some cases we will give estimates of the latter. However, we can draw the lesson that N_up ≈ n/10 updates are apparently sufficient to decorrelate the BCs.

The distance between fixed boundary conditions. Fig. 2 shows how the error bar on the A_2 correlator depends on the number of submeasurements n, at roughly fixed CPU time. We consider the comparison of error bars to be meaningful at the 20% level. The fundamental state in that lattice IR is relatively heavy. It is known to have rotation properties very similar to those of a continuum spin-4 glueball. The left graph shows the Δ = 4 data, the right one the Δ = 8 data.

In the first case, the smallest error bar is achieved for the smallest number of measurements (here n = 100) at all time-separations (t = 2, 3, 4). On the right-hand side, the situation is the following:
- for time-separations t = 2, 3, 4, a small number of submeasurements (n = 200) is more favourable;
- interestingly, the error bar on the t = 5 correlator is practically independent of n;
- for t ≥ 6, the hierarchy is inverted: the runs with a large number of subsweeps (n = 800, 1600) yield smaller error bars.

This is consistent with the rule of thumb proposed in [11], namely that the optimal number of submeasurements should be of order e^{mt}. In the present case, given that am(A_2) ≈ 1.2, this evaluates to ≈ 1300 at t = 6.
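The rule of thumb can be tabulated directly. This sketch (hypothetical function name, lattice units throughout) evaluates e^{mt} for am = 1.2 around the crossover region discussed above.

```python
import math


def rule_of_thumb_n(am, t):
    """Optimal number of submeasurements of order exp(m t); am and t in lattice units."""
    return math.exp(am * t)


for t in (4, 5, 6, 7):
    print(t, round(rule_of_thumb_n(1.2, t)))
```

At t = 6 this gives about 1340, i.e. the ≈ 1300 quoted above. The estimate grows by a factor e^{1.2} ≈ 3.3 per lattice spacing, which is why the preferred n shifts quickly with the time separation.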

We can already draw the conclusion that Δ = 4 is too small a time-block: a large number of submeasurements does not lead to further variance reduction on the correlator. Δ = 8 ≈ 0.5 fm/a, on the other hand, seems well suited for that purpose. This conclusion is expected to hold in 3+1D pure gauge systems as well, since their long-distance correlations are very similar to those of the present 2+1D case.

It is also interesting to look at the error bars on the LEM. When a large number of submeasurements is performed, the 2-point functions at times t and t + a can be expected to be more strongly correlated numerically. This leads to a reduction of the fluctuations of their ratio. This point is illustrated in fig. 2:
- On the left (Δ = 4), the variance of the LEM at 3.5 lattice spacings is practically independent of n.
- On the right, even at the smaller time separations t = 2, 3, 4, the runs with a large number of subsweeps (800) are at least as good as the runs with the smaller number of subsweeps (200). From 5.5 lattice spacings onwards, there is a clear advantage in performing a large number of subsweeps. For instance, at equal CPU time, the error bar for the n = 800 run is roughly 3 times smaller than that for the n = 200 run.

Time-separation dependence of the error bars. It is also instructive to look at the same data from another point of view. For a fixed number of submeasurements, how do the error bars on the correlator and on its LEM vary as a function of time separation? Fig. 3 clearly shows that the error bar decreases exponentially as the operators are measured further away from the fixed boundaries. For Δ = 8, the variance drops by a factor of 100 between t = 2 and t = 7 for the runs with n = 800 and 1600.
Mass dependence of the error bars. In fig. 4 we plot the LEM together with the local decay constant of the error bar on the correlator. The upper figure illustrates the situation with a large number of submeasurements (n = 1600). The lower one shows what happens with only n = 200.

We show two light states, the fundamental A_1 and A_3 states, as well as the fundamental A_2 state considered so far.
- For the A_1 and A_3, the error bar decays along with the signal, since its decay constant matches the LEM of the corresponding operator. As a consequence, long mass plateaux are seen, with error bars increasing only very slowly.
- For the heavier A_2 state, the decay constant of the error bar keeps up only to 4.5 lattice spacings, resulting in a fast loss of the signal beyond that.

This is nevertheless a much more favourable situation than with only 200 submeasurements. There, the lightest glueball plateau is obtained just as well, but the A_3 data are much shakier and the A_2 signal is essentially lost beyond 4.5 lattice spacings. Although the basis of A_3 operators was the same for n = 1600 as for n = 200, the variational calculation performed slightly less well in the latter case.

In fact, the time separation at which the decay constant of the error bar falls off in fig. 4 indicates the time-separation for which the number of submeasurements is optimal. As long as the error bar continues to fall off, the measurements still have a large degree of statistical dependence through the common BC, since moving further away from the fixed BC makes them less dependent. However, once the error bar becomes constant far away from the BC (i.e. its decay constant is zero), the signal-to-noise ratio falls exponentially to zero. Thus n = 1600 is best suited for measuring the A_2 mass (am ≈ 1.2) at 5.5 lattice spacings.
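The two quantities compared in this figure can be computed with the same formula. The following minimal sketch uses hypothetical function names. It takes the LEM as a·m_eff(t + a/2) = ln[C(t)/C(t + a)] and applies the identical expression to the error bars ΔC(t). When the two agree, the relative error stays constant. When the error decay constant drops to zero, the signal-to-noise ratio falls like e^{-mt}.

```python
import numpy as np


def local_effective_mass(c):
    """a*m_eff(t + a/2) = ln[C(t) / C(t + a)] for C sampled at t = 0, a, 2a, ..."""
    c = np.asarray(c, dtype=float)
    return np.log(c[:-1] / c[1:])


def error_decay_constant(dc):
    """Local decay constant of the error bars Delta C(t): same formula."""
    return local_effective_mass(dc)


t = np.arange(8)
signal = np.exp(-1.2 * t)
print(local_effective_mass(signal))            # plateau at am = 1.2
print(error_decay_constant(np.full(8, 1e-3)))  # zero for a constant error bar
```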

4 Optimization procedure & performance 

We now proceed to a more systematic study of the efficiency of the 2-level algorithm. We consider three states, in the A_1, A_2 and A_3 lattice IRs. The lightest states in these representations correspond [13] to the J^{PC} = 0^{++}, J = 4 and J = 2 continuum states. Our procedure is to measure these three correlators at fixed physical Euclidean time separation t. We do so at three values of β = 6, 9 and 12. Recall that in the scaling region, the lattice spacing simply scales as 1/β. The correlator is evaluated for different numbers of submeasurements under fixed BCs:

$$1 \le n \le 200 \qquad (4)$$
We then plot the inverse efficiency ξ^{-1} as a function of the number of submeasurements:

$$\xi^{-1}(n) = [\Delta C_n(t)]^2 \times n, \qquad (5)$$

where ΔC_n(t) is the error bar on the correlator when measured n times under fixed BCs. In some cases, we shall also consider the efficiency with respect to the LEM, in which case ΔC_n(t) is replaced by Δm_eff(t).
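A hedged sketch of this figure of merit follows. It assumes that each BC contributes one value (e.g. the product of two block averages) and that different BCs are statistically independent. The error bar is taken from a jackknife with one BC per bin, whereas the text allows any binning. Both function names are hypothetical.

```python
import numpy as np


def jackknife_error(samples):
    """Jackknife error on the mean of per-BC samples (one value per BC)."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    loo = (x.sum() - x) / (n - 1)  # leave-one-out means
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))


def inverse_efficiency(per_bc_values, n_sub):
    """xi^{-1}(n) = [Delta C_n(t)]^2 * n, with Delta C_n from the jackknife."""
    return jackknife_error(per_bc_values) ** 2 * n_sub
```

For a plain mean, the jackknife error coincides with the usual standard error. The jackknife becomes useful when the same machinery is applied to non-linear quantities such as the LEM.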

In this study the number of BCs was 100, separated by 80 sweeps. The individual measurements done under fixed BCs were stored separately, to allow us to combine them in different ways. For instance, to obtain the efficiency corresponding to 10 submeasurements, we can split the 200 submeasurements of each BC into 20 'independent' sequences of 10 submeasurements. These 20 sequences are then used to estimate the variance of the error bars themselves. In figs. 5 and 7 we show these roughly estimated variances for n ≤ 20. Beyond that, the number of 'independent' sequences becomes smaller than 10 and the variance estimates become unreliable. The aim here is only to give the order of magnitude of the uncertainty on ξ^{-1}, so that we can reach meaningful conclusions about its minimum as a function of n.
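The splitting into sequences can be sketched as follows. This assumes (hypothetically) that the single submeasurements of O(t) are stored as an array of shape (N_bc, N_sub). The error bar per sequence uses a naive standard error over BCs, and the toy Gaussian data only exercise the code.

```python
import numpy as np


def split_sequences(sub, n):
    """Split the submeasurements of each BC into non-overlapping sequences of length n.

    sub has shape (N_bc, N_sub); the result has shape (N_sub // n, N_bc) and holds
    the BC average of each sequence.  Leftover submeasurements are dropped.
    """
    n_bc, n_sub = sub.shape
    n_seq = n_sub // n
    blocks = sub[:, : n_seq * n].reshape(n_bc, n_seq, n)
    return blocks.mean(axis=2).T


def error_bar_spread(sub_i, sub_j, n):
    """Error bar on <O(ti) O(tj)> per sequence, and its spread over sequences."""
    products = split_sequences(sub_i, n) * split_sequences(sub_j, n)  # (N_seq, N_bc)
    errs = products.std(axis=1, ddof=1) / np.sqrt(products.shape[1])
    return errs.mean(), errs.std(ddof=1)


rng = np.random.default_rng(1)
sub_i, sub_j = rng.normal(size=(2, 100, 200))  # toy data: 100 BCs x 200 submeasurements
print(split_sequences(sub_i, 10).shape)         # (20, 100)
print(error_bar_spread(sub_i, sub_j, 10))
```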

Eventually, of course, it is desirable to have an easier way to optimize the parameters of the algorithm. For an operator with exactly vanishing vacuum expectation value (VEV), we define a quantity ω as the zero-time-separation correlator multiplied by the number of submeasurements n:

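$$\omega(n, t_i) = \frac{n}{N_{bc}} \sum_{bc} \langle O(t_i)\rangle_{bc}^2, \qquad \langle O(t_i)\rangle_{bc} = \frac{1}{n}\sum_{meas=1}^{n} O(t_i) \qquad (6)$$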

Obviously ω is a function of the distance between the time-slice where the operator is measured and the fixed time-slices. This quantity is easy to evaluate accurately. One of the objectives of this analysis is to check the validity of ω as a predictor of the optimal number of submeasurements of the 2-level algorithm. The absolute value of ω will not interest us. Rather, we will check whether its minimum is reached at the same n as that of ξ^{-1}(n).
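A minimal sketch of ω follows, under the definition as written above. It assumes the submeasurements of one time-slice are stored as an array of shape (N_bc, N_sub) and that ω(n) is built from the first n submeasurements of each BC. The name `omega` and the toy data are illustrative only.

```python
import numpy as np


def omega(sub, n):
    """omega(n, t): n times the BC average of <O(t)>_bc^2, built from the first n
    submeasurements of each BC (sub has shape (N_bc, N_sub), operator with zero VEV)."""
    bc_avg = sub[:, :n].mean(axis=1)
    return n * float(np.mean(bc_avg ** 2))


rng = np.random.default_rng(2)
toy = rng.normal(size=(100, 200))  # pure noise, no BC dependence
scan = {n: omega(toy, n) for n in (1, 2, 5, 10, 20, 50, 100, 200)}
print(scan)
```

For pure white noise without any BC dependence, ω stays close to 1 for every n. On real data, any structure in ω(n) therefore reflects the BC dependence of the block averages.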

It is also interesting to compare the efficiency of the 2-level algorithm to that of the standard 1-level algorithm with an equal number of measurements. In the 1-level case, the translational invariance in the time direction is not broken by the algorithm. The sweeps between BCs have no raison d'être there. On the other hand, the measurements are done in every time-slice, including those that are kept fixed in the 2-level algorithm. The comparison of the algorithms is thus fair.

Let us first consider the lightest A_3 state (fig. 5). The graphs correspond, from top to bottom, to β = 6, 9 and 12. We keep the physical time separation approximately fixed at about 0.22 fm (2, 3 and 4 lattice spacings respectively). Similarly, the separation of the fixed time-slices is increased in lattice units: 4a at β = 6, 6a at β = 9 and 8a at β = 12. We also show the case Δ = 4a at β = 12 for comparison.

The first observation is that the 2-level algorithm performs better at all three lattice spacings. If the number of submeasurements is chosen 'reasonably', the inverse efficiency is smaller by a factor ≈ 3 at the coarsest lattice spacing, and by a factor ≈ 2 at both of the smaller lattice spacings, provided Δ is kept fixed in physical units.

Secondly, the curve for ξ^{-1} is extremely flat around its minimum. For instance, at β = 9 it seems not to matter whether one does 10 or 40 submeasurements: the performance for this particular observable is unchanged. The flatness becomes even more pronounced closer to the continuum. This, however, is not true for the case Δ = 4a at β = 12. There the curve has a narrow minimum at a small number of submeasurements, but the inverse efficiency then increases rapidly, and this setup becomes less favourable than the standard algorithm.

Thirdly, the quantity ω shown at β = 12 is a very good predictor of the minimum of the inverse efficiency curve, both for Δ = 4a and for Δ = 8a. (It has been rescaled so that it can be plotted along with the other curves.) Its qualitative aspect, including the flatness, is very similar to that of the ξ^{-1} curve.

The qualitative statements made for the A_3 correlator also apply to the A_2 correlator (see fig. 6), whose mass is larger by a factor ≈ 4/3. As one might expect, the higher mass favours the use of the 2-level algorithm even more: the gain in CPU time at constant error bars is roughly a factor 6 at all three values of β. Again the ξ^{-1} curve is extremely flat, but the optimal number of submeasurements has shifted to the right; in fact, 100 submeasurements seem to be a good choice at all three lattice spacings. Choosing a narrow width for the time-blocks has a clear disadvantage: it leads to a smaller gain in efficiency, and this gain varies much more rapidly with the number of submeasurements. These facts are again well predicted by the curve ω.

The ++ case. We now move to the A_1 correlator, which gives the mass of the lightest glueball. Since this is the trivial representation, the operator has a non-zero VEV. This VEV has to be subtracted in one way or another in order to extract information on the glueball spectrum. With the ordinary 1-level algorithm, it is customary to subtract the VEV a posteriori:

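$$C(t) = \frac{1}{N_t}\sum_{t'=1}^{N_t} \Big\langle \big(O(t')-\langle O\rangle\big)\big(O(t+t')-\langle O\rangle\big)\Big\rangle = \frac{1}{N_t}\sum_{t'=1}^{N_t}\langle O(t')\,O(t+t')\rangle - \Big(\frac{1}{N_t}\sum_{t'=1}^{N_t}\langle O(t')\rangle\Big)^2 \qquad (7)$$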

This way of proceeding is perfectly applicable to the 2-level algorithm, provided that only those measurements incorporated in the 2-point function are included in the VEV evaluation. In other words, exactly the same measurements must appear in the second sum as in the first:

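$$C(t) = \frac{1}{|\Theta_t|}\sum_{t'\in\Theta_t}\langle O(t')\,O(t+t')\rangle - \Big(\frac{1}{|\Theta_t|}\sum_{t'\in\Theta_t}\langle O(t')\rangle\Big)^2 \qquad (8)$$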

where Θ_t is a subset of {1, ..., N_t}. Θ_t varies with t, because the measurement of the correlator uses different time-slices depending on the time-separation. It is recommended to store the measurements in double precision, since the cancellation between the two sums grows with the time-separation.
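A hedged sketch of this subtraction follows, in which both sums run over the same set of time-slices. The data layout (a float64 array of BC averages) and the explicit list of slice pairs standing in for Θ_t are assumptions made for illustration. The function name `vev_subtracted` is hypothetical.

```python
import numpy as np


def vev_subtracted(bc_avg, pairs):
    """C(t) with the VEV estimated from exactly the slices used in the 2-point sum.

    bc_avg: float64 array (N_bc, N_t) of BC averages <O(t')>_bc.
    pairs:  (t', t' + t) index pairs forming the set Theta_t; each pair must
            straddle a frozen slice so that the two-level product is valid.
    """
    first = np.array([p[0] for p in pairs])
    second = np.array([p[1] for p in pairs])
    two_point = np.mean(bc_avg[:, first] * bc_avg[:, second])
    one_point = np.mean(bc_avg[:, first])  # same t' in Theta_t as in the first sum
    return float(two_point - one_point ** 2)
```

Keeping the array in float64 matters here, because the two terms become nearly equal at large t and their difference is what carries the signal.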

Our experience is that failing to do the subtraction in this way leads to a very large variance on the correlator (30 to 50% in a typical run). With the proposed subtraction, one is really measuring, on a large but finite set of configurations, the fluctuation of the operator around its average value on these same configurations. Naturally, in the infinite-statistics limit both schemes give the same answer. The proposed one, however, benefits from the strong correlation between the 2-point and 1-point functions when they are measured on the same configurations.

There are of course many alternative possibilities¹. One of them relies on the variational method, which is widely used to improve the projection onto the fundamental state and to extract information on the excited spectrum. This approach was applied for instance in [13]. It consists in feeding the unsubtracted correlation matrices into the variational calculation. The generalized eigenvalue problem then yields the massless vacuum, followed by the fundamental glueball, the first excited state, etc. In our experience, the determination of the vacuum is very accurate, and the variance on the masses of the physical states did not seem to be higher. Naturally, one of the operators in the basis is used up in projecting out the vacuum. This is not an issue when a large set of operators is available, as is usually the case.

Finally, we note that another group has used the 2-level algorithm for compact U(1) scalar glueball calculations, where the forward-backward symmetric derivative of the correlator was taken. Since the VEV contributes only a constant to the correlator, it drops out of the derivative. At small temporal lattice spacing, the finite-difference formula evaluates the derivative accurately, owing to the large correlations between neighbouring time-slices. The idea is thus related to that expressed by eqn. (8).
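A generic forward-backward difference is sketched below. It is a plain finite-difference formula, not necessarily the exact prescription of the group mentioned. It shows that a constant VEV contribution cancels, and that for a single exponential the derivative decays with the same mass.

```python
import numpy as np


def symmetric_derivative(c):
    """-dC/dt at t = a, 2a, ..., via [C(t - a) - C(t + a)] / 2 in lattice units.

    A constant (the VEV contribution) drops out of the difference."""
    c = np.asarray(c, dtype=float)
    return 0.5 * (c[:-2] - c[2:])


t = np.arange(10)
pure = np.exp(-0.7 * t)
with_vev = pure + 4.0  # same correlator plus a constant VEV^2 term
print(np.allclose(symmetric_derivative(pure), symmetric_derivative(with_vev)))
```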

¹ I thank Urs Wenger for discussions on this point.
These different methods are illustrated in fig. 7, where the inverse efficiencies of the 1- and 2-level algorithms are plotted as a function of n. The VEV has been subtracted in one of two ways: by use of eqn. (8), or by applying the variational method to a set of three operators. With either method, the resulting operator had a very large overlap onto the lightest state, so a comparison is meaningful.

We see that with either algorithm, the two VEV-subtraction methods perform equally well. The second observation is that the 2-level algorithm performs poorly here if n > 10. If we turn to the LEM, the derivative method and the direct VEV-subtraction method have the same efficiency once n > O(50). For n < 50, the VEV-subtraction looks better. Note, however, that for discretization reasons the LEM on the derivative is actually at 4 lattice spacings rather than 3.5.

5 Physics applications 

We present four datasets: glueballs and k-string tensions in 2+1 and 3+1 dimensions. In each case we compare the efficiency of the ordinary 1-level algorithm with that of the 2-level algorithm.

A general comment applies to all four cases. The operators used are all variationally [?] determined linear combinations of smeared [7] and blocked [8] operators with large overlaps onto the physical states. The variational method involves the Cholesky decomposition of a correlation matrix, which fails if the statistical noise spoils the positivity of the matrix. Our experience with the 1-level algorithm is that the procedure often fails for that reason if it is applied at t > 1. By contrast, we were always able to perform the decomposition when using the 2-level algorithm. The explanation is that the more rapidly decaying operators, which couple to excited states, are measured more accurately, so that positivity is preserved.
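One common way to set up the generalized eigenvalue problem is sketched below. It is not necessarily the exact procedure used here (the choice of t0 and of the operator basis is not specified). The sketch Cholesky-decomposes C(t0) and diagonalizes the reduced matrix. NumPy raises `LinAlgError` precisely when noise has destroyed the positivity of C(t0). With unsubtracted A_1 correlators, a vacuum contribution would show up as an eigenvalue close to 1, i.e. a mass close to zero.

```python
import numpy as np


def variational_masses(c_t0, c_t, dt):
    """Effective masses from the generalized eigenvalue problem C(t) v = lam C(t0) v.

    C(t0) = L L^T is Cholesky-decomposed; numpy raises LinAlgError if noise has
    spoiled its positivity.  dt = t - t0 in lattice units; lightest state first.
    """
    L = np.linalg.cholesky(c_t0)
    L_inv = np.linalg.inv(L)
    reduced = L_inv @ c_t @ L_inv.T          # symmetric, same spectrum as C(t0)^-1 C(t)
    lam = np.linalg.eigvalsh(reduced)[::-1]  # largest eigenvalue <-> lightest state
    return -np.log(lam) / dt


# Toy two-state correlation matrix C(t) = Z diag(exp(-m t)) Z^T.
Z = np.array([[1.0, 0.3], [0.5, 1.0]])
m = np.array([0.4, 1.1])


def corr(t):
    return Z @ np.diag(np.exp(-m * t)) @ Z.T


print(variational_masses(corr(1.0), corr(2.0), 1.0))  # recovers [0.4, 1.1]
```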

5.1 Glueballs in 2+1 dimensions 

As a first physics application of the methods presented, we extract the masses of several excited glueballs in 2+1D SU(2) gluodynamics. Table 1 shows the data at three different lattice spacings. The parameters of the 2-level algorithm are n = 5000, 1000, 800 and Δ = 4, 6, 8 for β = 6, 9, 12 respectively. The variational method is applied at 2.5 lattice spacings for the first two lattice spacings, and at 3.5 lattice spacings for the third. A detailed efficiency comparison has already been made in section 4.

The identification of the continuum quantum numbers was worked out in [?] and [12], and will be detailed further elsewhere [19]. Here, however, we have increased statistics and make use of the 2-level algorithm. This allows us to determine accurately glueball masses as large as three times the lightest glueball mass. In the future, this ability will have to be complemented by techniques that reduce systematic uncertainties in glueball spectroscopy (mixing of single-glueball states with scattering states, multi-torelon states, ...).

5.2 Glueballs in 3+1 dimensions 

The multi-level algorithm applied to gauge-invariant correlators was first tested on glueball operators in the 3+1 dimensional SU(3) gauge theory [11]. Only some exploratory steps were carried out there to optimize the parameters, whereas in this paper we have performed a much more detailed study in 2+1 dimensions. One might wonder whether the conclusions carry over to 3+1 dimensions, since the short-distance fluctuations scale differently.

Here we simply present a comparison of efficiency in a realistic glueball calculation at β = 6.0, 6.2 and 6.4, where we can compare our data to the ten-year-old UKQCD data. The parameters of the 2-level algorithm are n = 40 for all three values of β, with Δ = 8 for β = 6.2 and 6.4, and Δ = 6 for β = 6.0. Let us focus on the lightest states in the A_1^{++}, E^{++} and T_2^{++} representations (see table 2). We compare the efficiency in terms of the error bars on the LEMs. To do so, we scale the 1-level error bar to the number of sweeps done in the 2-level run (see eqn. (5)).

The same conclusions hold as in 2+1D: apart from the lightest glueball, the 2-level algorithm is more efficient than the 1-level one, and its advantage increases rapidly with the mass of the state. The comparison to the UKQCD data is slightly less robust, because the operators used are not the same. The difference in the extent of the time direction was compensated by scaling up the statistics of the 2-level run. Nevertheless, the same trend is observed as in the comparisons at the coarser lattice spacings.

Consider the correlator at four lattice spacings. It can be obtained by correlating the time-slices situated either symmetrically or asymmetrically around the fixed time-slice. Naturally, the first way is more favourable. However, for a very massive state, the measurements are expected to be only very weakly correlated with the fixed BC. The asymmetric correlator can therefore increase the statistics and reduce the final error bar. In fact, one can take any mixture of both measurements. If i is the time-coordinate of the fixed BC:



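$$C(t=4) \propto \sum_{bc} \Big\{ \frac{\alpha}{2}\Big[\langle O(i+1)\rangle_{bc}\langle O(i-3)\rangle_{bc} + \langle O(i-1)\rangle_{bc}\langle O(i+3)\rangle_{bc}\Big] + \langle O(i+2)\rangle_{bc}\langle O(i-2)\rangle_{bc} \Big\} \qquad (9)$$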
The parameter α can be optimized as well. This is illustrated in fig. 8, where the variances of the LEMs are plotted as a function of α for β = 6.0. We see that the optimal value of α increases with the mass of the state, as one might expect. However, the dependence on α is weak for α > 0.2: choosing α = 0.3 is practically optimal for all states, and α = 1 is not much worse. (Note that the choice of α is made a posteriori.)
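Because the BC averages are stored, α can be scanned after the run. The following sketch assumes the mixture takes the form written above, normalized by (1 + α) so that every product estimates C(4). As a simplification, it minimizes the naive standard error of the correlator rather than the variance of the LEM used in the figure. NumPy's negative indexing wraps around, which matches periodic boundary conditions in time. The function name and toy data are hypothetical.

```python
import numpy as np


def mixed_c4(bc_avg, i, alpha):
    """C(t = 4) from pairs straddling the frozen slice i, mixed with weight alpha.

    bc_avg has shape (N_bc, N_t) and holds <O(t)>_bc; negative indices wrap
    around (periodic time direction).  Requires delta >= 4 so that slices
    i-3 ... i+3, other than i itself, are not frozen.
    Returns the estimate and its naive standard error over BCs.
    """
    sym = bc_avg[:, i + 2] * bc_avg[:, i - 2]
    asym = bc_avg[:, i + 1] * bc_avg[:, i - 3] + bc_avg[:, i - 1] * bc_avg[:, i + 3]
    per_bc = (sym + 0.5 * alpha * asym) / (1.0 + alpha)  # normalized mixture
    return per_bc.mean(), per_bc.std(ddof=1) / np.sqrt(len(per_bc))


rng = np.random.default_rng(3)
toy = rng.normal(1.0, 0.5, size=(200, 32))  # toy BC averages, frozen slice at i = 0
alphas = np.linspace(0.0, 1.0, 11)
errors = [mixed_c4(toy, 0, a)[1] for a in alphas]
best_alpha = alphas[int(np.argmin(errors))]
print(best_alpha)
```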

5.3 k-strings in 2+1 dimensions

We present an excerpt of a dataset that will be published in full elsewhere [?]. We compute the masses of k = 1, 2, 3 and 4 fuzzy flux tubes in 2+1D SU(8) gauge theory; our methodology follows that of [6]. Table 3 contains data obtained with both the 1- and 2-level algorithms on a relatively coarse lattice (a√σ = 0.2550(4)). The algorithm parameters are n = 1000, Δ = 4. The lattice was anisotropic in size, in order to check for corrections to the linear dependence of the mass on the length of the Polyakov loop.

The small mass of the k = 1, L ≈ 2 fm state makes the 1-level algorithm far more efficient for computing the LEM, even at 2.5 lattice spacings. The 2-level algorithm only becomes superior for L > 2.6 fm. Moreover, the overlap of our operator onto the fundamental state is better than 99%, which means that the mass can be extracted already at 0.5 lattice spacings. The lesson we learn from this is that it is worth keeping the correlator at small time-separations, as indicated in section 2.

For all other states, the conclusions are quite different. The 2-level algorithm is 4 times more efficient for the k = 2, L = 2 fm string, and this gain in efficiency grows extremely rapidly with the mass of the state.

5.4 k-strings in 3+1 dimensions

Finally, we present data obtained some time ago on k = 1 and k = 2 strings in 3+1D SU(4) gauge theory at three different lattice spacings (see table 4). For β = 10.90, 11.10 and 11.50, the parameters of the 2-level algorithm were n = 80, 20, 20 and Δ = 6, 4, 4 respectively. The masses are extracted from cosh fits starting at 2.5 lattice spacings. The data at the first two values of β can be directly compared to those of [6], where 10^5 sweeps were done in each case with the 1-level algorithm. Two comments are in order. First, the k = 1 string is obtained more accurately with the 1-level algorithm, while the 2-level algorithm performs slightly better for the k = 2 string at β = 11.10, although its parameters may not have been optimal. Second, at β = 10.90 we notice a discrepancy of 2.9 standard deviations between the two masses. This is presumably due to the fact that, in the work of Teper and Lucini, the cosh fit was started at one lattice spacing for that particular state [18]. Generally speaking, the 2-level algorithm has the advantage of yielding longer mass plateaux, because of the decrease of the error bar with the separation from the BC, and it therefore helps reduce the systematic error. Indeed, because of the positivity of correlators, lattice calculations naturally tend to overestimate the masses extracted from them.
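The masses above come from cosh fits and from local effective masses (LEMs) such as m_eff(2.5a), which combine C(t) and C(t+1). On a lattice with periodic time extent T, a common definition, assumed here rather than taken from the text, solves C(t)/C(t+1) = cosh(m(t − T/2))/cosh(m(t + 1 − T/2)) for m, treating the correlator as dominated by a single state. A minimal bisection sketch (the function name is hypothetical):

```python
import math

def cosh_effective_mass(c_t, c_t1, t, T, m_max=10.0, tol=1e-13):
    """Local effective mass at t + 1/2 from C(t) and C(t+1), assuming a single state
    and C(t) proportional to cosh(m (t - T/2)) with periodic time extent T."""
    if not (c_t > c_t1 > 0):
        raise ValueError("need C(t) > C(t+1) > 0")
    if t + 1 > T / 2:
        raise ValueError("need t + 1 <= T/2")
    target = c_t / c_t1

    def ratio(m):
        # C(t)/C(t+1) for the cosh form; grows monotonically with m when t + 1 <= T/2
        return math.cosh(m * (t - T / 2)) / math.cosh(m * (t + 1 - T / 2))

    lo, hi = 0.0, m_max
    if ratio(hi) < target:
        raise ValueError("increase m_max")
    while hi - lo > tol:  # plain bisection on the monotonic ratio
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exact single-state correlator with m = 0.5 on T = 32
T, m_true = 32, 0.5
C = [math.cosh(m_true * (t - T / 2)) for t in range(T + 1)]
m_early = cosh_effective_mass(C[2], C[3], 2, T)     # LEM at 2.5a
m_middle = cosh_effective_mass(C[15], C[16], 15, T)  # next to T/2
naive_middle = math.log(C[15] / C[16])              # log-ratio without the cosh form
```

Near t = T/2 the naive log-ratio is strongly biased by the backward-propagating contribution (about 0.12 instead of 0.5 here), whereas the cosh form recovers the input mass for a single state.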

6 Summary & outlook 

It is time to summarize what we have learnt about the 2-level algorithm. We 
have emphasized the linear dependence of the data size on the number of 
operators; auto-correlations can be checked for easily, and the precise way in 
which the correlator is computed can be optimized a posteriori. 
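One standard way to check for auto-correlations, and the one displayed in fig. 1, is to follow the jackknife error as a function of the bin size: once bins are longer than the auto-correlation time, the error stops growing. A minimal sketch, assuming the measurements are stored in Monte Carlo order; the function names are hypothetical and the LEM used here is the naive log-ratio.

```python
import numpy as np

def jackknife_error(data, bin_size, estimator=lambda m: m):
    """Binned jackknife error of estimator(mean of data).

    data: array of shape (N, ...) in Monte Carlo order (e.g. one correlator
    vector per measurement). Consecutive measurements are grouped into bins of
    length bin_size; leftover measurements at the end are dropped.
    """
    data = np.asarray(data, dtype=float)
    n_bins = data.shape[0] // bin_size
    if n_bins < 2:
        raise ValueError("need at least two bins")
    binned = data[: n_bins * bin_size].reshape((n_bins, bin_size) + data.shape[1:]).mean(axis=1)
    # Means with one bin left out, then the estimator applied to each of them
    loo_means = (binned.sum(axis=0) - binned) / (n_bins - 1)
    values = np.array([estimator(m) for m in loo_means])
    spread = values - values.mean(axis=0)
    return np.sqrt((n_bins - 1) / n_bins * np.sum(spread ** 2, axis=0))

def local_effective_mass(c):
    """Naive LEM log[C(t)/C(t+1)] of a correlator vector."""
    return np.log(c[:-1] / c[1:])
```

Usage: for a stored array `corr` of shape (N_meas, T), `[jackknife_error(corr, b, local_effective_mass) for b in (1, 2, 4, 8, 16)]` gives the LEM error per time-slice for increasing bin sizes; with bin size 1 and the identity estimator the result reduces to the usual standard error of the mean.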

The optimization study of the parameters led to the conclusion that A ~ 
t=j is a good choice for the separation of the fixed time-slices. In that 
case, the variance of the correlator decreases exponentially, as e^{−mt}, as long as the number of measurements at fixed boundary conditions satisfies n > e^{mt}. As a consequence, longer mass plateaux are seen, even for the more massive states. This feature should help reduce the systematic bias towards overestimating the masses being calculated.

Suppose we want to compute the correlator at time-separation t from measurements in time-slices i + t/2 and i − t/2, with respect to the fixed time-slice position i. The optimization of n can be achieved by minimizing the product of n and the variance of the correlator measured n times at distance t/2 from the fixed time-slices. This quantity is easy to compute as a function of n; it is sufficient to store the individual measurements separately. For a fixed physical separation t, the optimal number of measurements n depends only weakly on the lattice spacing. A possibility that we have not explored is to let the number of measurements depend on the boundary conditions, with a termination condition determined by the desired accuracy (which would presumably be chosen to be proportional to 1/√N_bc).
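The prescription above (minimize n times the variance, using individually stored measurements) can be sketched as follows. For operators in two blocks separated by a fixed time-slice, each set of boundary conditions (BC) yields one correlator estimate, the product of the two sub-averages over n measurements. The data model is a toy assumption for illustration only, not a gauge-theory measurement: each stored value is a BC-dependent part plus independent noise, for which n·Var is minimal near n = σ_noise²/σ_bc² (here 100). The cost of updating the BC is neglected, as in the criterion stated in the text; all names are hypothetical.

```python
import numpy as np

def toy_measurements(rng, n_bc, n_max, sigma_bc, sigma_noise):
    """Toy model (assumption): measurement j under BC b is s_b + noise_bj."""
    s = rng.normal(0.0, sigma_bc, size=(n_bc, 1))
    return s + rng.normal(0.0, sigma_noise, size=(n_bc, n_max))

def cost_weighted_variance(o1, o2, n_values):
    """Return n * Var_bc[mean_n(o1) * mean_n(o2)] for each n in n_values.

    o1, o2 have shape (N_bc, n_max): individually stored measurements of the two
    operators, one row per set of boundary conditions.
    """
    c1 = np.cumsum(o1, axis=1)
    c2 = np.cumsum(o2, axis=1)
    result = []
    for n in n_values:
        # Sub-averages over the first n measurements under each fixed BC,
        # multiplied to give one correlator estimate per BC
        per_bc = (c1[:, n - 1] / n) * (c2[:, n - 1] / n)
        # At fixed CPU time the number of BCs scales like 1/n, so the variance
        # of the final average is proportional to n * Var_bc
        result.append(n * np.var(per_bc, ddof=1))
    return np.array(result)

rng = np.random.default_rng(0)
o1 = toy_measurements(rng, 2000, 400, sigma_bc=0.1, sigma_noise=1.0)
o2 = toy_measurements(rng, 2000, 400, sigma_bc=0.1, sigma_noise=1.0)
n_values = [1, 2, 5, 10, 20, 50, 100, 200, 400]
cost = cost_weighted_variance(o1, o2, n_values)
best_n = n_values[int(np.argmin(cost))]
```

Cumulative sums make every n ≤ n_max available from a single run, which is why storing the individual measurements is sufficient.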

The efficiency of the 2-level algorithm was compared to that of the 1- 
level algorithm in 2+1 and 3+1 dimensions for various gauge groups. We 
found that the 2-level algorithm performs better for all glueball states but 
the lightest. The kind of gain in computing-time one can expect in realistic 
glueball spectrum calculations varies between 1.5 and 7 for the lightest states 
in the lattice irreducible representations in the case of 2+1D SU(2). The gain 
then increases exponentially with the mass of the state. If high accuracy is required for the lightest glueball, it might make sense to do a separate run using the 1-level algorithm: at any rate, it will use far less computing time than is required for the heavy states. The same qualitative statements apply to computations of flux-tube masses. The 2-level algorithm starts to become more favourable at a string length of ≈ 2.5 fm, and it always performs better for the excited states and for strings in higher representations.

We would like to conclude by mentioning two further applications of the 
2-level algorithm. As was suggested in [11], the method should be well suited to compute 3-point functions of glueballs [15] and flux-tubes [16], since these
observables involve 3 factors, each subject to UV fluctuations. The possibility 
is being investigated. 

An alternative to variational calculations in conjunction with a large number of fuzzy operators is the spectral function method (for a review, see [17])
in conjunction with the maximal entropy method to perform the inverse 
Laplace transform. It would be interesting to investigate the possibility of 
using UV operators (e.g. a bare plaquette, which couples equally to many 
states) to extract the glueball spectrum. The correlator would need to be 
measured very accurately - and here we expect the 2-level algorithm to be 
of great help - on a lattice with a very fine temporal resolution. We leave 
this line of research open for the future. 

Acknowledgements In the course of this work I benefited a lot from inter- 
acting with Biagio Lucini, Michael Teper and Urs Wenger at Oxford Univer- 
sity. I also thank M. Hasenbusch, S. Kratochvila, P. Majumdar and P. Weisz 
for interesting discussions at LATTICE 03. The numerical calculations were 
performed on PPARC and EPSRC funded workstations and on a Beowulf 
cluster in Oxford Theoretical Physics. Finally I wish to thank the Berrow 
Trust for financial support. 



References 

[1] M. J. Teper, Phys. Rev. D 59 (1999) 014512 [arXiv:hep-lat/9804008].

[2] C. J. Morningstar and M. J. Peardon, Phys. Rev. D 60 (1999) 034509 [arXiv:hep-lat/9901004].

[3] M. Lüscher and P. Weisz, JHEP 0207 (2002) 049 [arXiv:hep-lat/0207003].

[4] K. J. Juge, J. Kuti and C. Morningstar, arXiv:hep-lat/0312019.

[5] L. Del Debbio, H. Panagopoulos, P. Rossi and E. Vicari, JHEP 0201 (2002) 009 [arXiv:hep-th/0111090].

[6] B. Lucini and M. Teper, Phys. Rev. D 64 (2001) 105019 [arXiv:hep-lat/0107007].

[7] M. Albanese et al. [APE Collaboration], Phys. Lett. B 192 (1987) 163.

[8] M. Teper, Phys. Lett. B 183 (1987) 345.

[9] G. Parisi, R. Petronzio and F. Rapuano, Phys. Lett. B 128 (1983) 418.

[10] M. Lüscher and P. Weisz, JHEP 0109 (2001) 010 [arXiv:hep-lat/0108014].

[11] H. B. Meyer, JHEP 0301 (2003) 048 [arXiv:hep-lat/0209145].

[12] H. B. Meyer and M. J. Teper, Nucl. Phys. B 658 (2003) 113 [arXiv:hep-lat/0212026].

[13] H. B. Meyer and M. J. Teper, Nucl. Phys. B 668 (2003) 111 [arXiv:hep-lat/0306019].

[14] P. Majumdar, Y. Koma and M. Koma, arXiv:hep-lat/0309038.

[15] G. A. Tickle and C. Michael, Nucl. Phys. B 333 (1990) 593.

[16] P. Pennanen, A. M. Green and C. Michael, Phys. Rev. D 56 (1997) 3903 [arXiv:hep-lat/9705033].

[17] M. Asakawa, T. Hatsuda and Y. Nakahara, Prog. Part. Nucl. Phys. 46 (2001) 459 [arXiv:hep-lat/0011040].

[18] M. Teper, private communication.

[19] H. B. Meyer, D.Phil. thesis, in preparation.

[20] N. Cabibbo and E. Marinari, Phys. Lett. B 119 (1982) 387.

[21] K. Fabricius and O. Haan, Phys. Lett. B 143 (1984) 459; A. D. Kennedy and B. J. Pendleton, Phys. Lett. B 156 (1985) 393.

[22] S. L. Adler, Phys. Rev. D 23 (1981) 2901.

[23] B. Berg and A. Billoire, Nucl. Phys. B 221 (1983) 109; M. Lüscher and U. Wolff, Nucl. Phys. B 339 (1990) 222.

[24] G. S. Bali, K. Schilling, A. Hulsebos, A. C. Irving, C. Michael and P. W. Stephenson [UKQCD Collaboration], Phys. Lett. B 309 (1993) 378 [arXiv:hep-lat/9304012].

[25] H. B. Meyer and C. P. Korthals Altes, in preparation.

[26] P. Weisz, private communication (Tsukuba, Jul. 2003).

[27] M. Hasenbusch, private communication (Tsukuba, Jul. 2003).

[28] M. Laine, H. B. Meyer, K. Rummukainen and M. Shaposhnikov, in preparation.



Comment on the Lüscher-Weisz algorithm

Consider the LW algorithm for bare time-like Polyakov loop correlators. It was noted by several groups ([26], [27], [28], where the implementation is due to K. Rummukainen) that the LW algorithm can probably still be improved (and also simplified) for large enough separation R.

Indeed, the physical system consisting of one "time block" with fixed spatial links as BC can be studied in its own right. It is reminiscent of the finite-temperature gauge system, where the BC are however periodic. The two systems have the same exact Z_N symmetry, which can be broken spontaneously if the width of the block is small (corresponding to high temperature). It must thus be ensured that this does not happen; otherwise the error reduction will fail to take place: the VEV taken by the segments of Polyakov loop will only be averaged out at the outer level, and will therefore suffer from O(1/√N_bc) fluctuations.

The BCs break Lorentz invariance in the subsystem. However, by analogy with Dirichlet BCs, we expect that the dimensionally-reduced theory, containing an adjoint Higgs field and the gauge field, exhibits a pseudo-mass gap. In this situation, the correlation between the two segments of link products decays very rapidly at large distance R, and it should help to perform multiple measurements of the two segments separately by keeping a slice S between them fixed. In this way it is not necessary to store the direct product of SU(N) matrices. In fact, this version of the MLA has been found to perform well even in the presence of an adjoint field [28].
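The storage saving can be made concrete. With S fixed, the two segments are measured independently, so the average of the direct product over all pairs of measurements equals the direct product of the two separate averages; this identity is purely algebraic (bilinearity of the Kronecker product), while the physics input is the conditional independence argued above. In the sketch, random complex matrices stand in for the SU(N) link products, and the function names are hypothetical.

```python
import numpy as np

def tensor_average_all_pairs(seg_a, seg_b):
    """Average of kron(A_i, conj(B_j)) over all pairs (i, j): an N^4-entry object."""
    acc = np.zeros((seg_a.shape[1] * seg_b.shape[1],) * 2, dtype=complex)
    for A in seg_a:
        for B in seg_b:
            acc += np.kron(A, B.conj())
    return acc / (len(seg_a) * len(seg_b))

def tensor_average_factorised(seg_a, seg_b):
    """Same quantity from the separate segment averages: only 2 N^2 entries are stored."""
    return np.kron(seg_a.mean(axis=0), seg_b.mean(axis=0).conj())

# Stand-ins for n measurements of each N x N segment (not SU(N) matrices)
rng = np.random.default_rng(0)
N = 3
seg_a = rng.normal(size=(20, N, N)) + 1j * rng.normal(size=(20, N, N))
seg_b = rng.normal(size=(30, N, N)) + 1j * rng.normal(size=(30, N, N))
```

Separate measurements thus give the equivalent of n_a·n_b pairs from n_a + n_b measurements, at a memory cost of 2N² rather than N⁴ complex numbers.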
IR   spin   β = 6.0            β = 9.0           β = 12.0          continuum
            a√σ = 0.2538(10)   a√σ = 0.1616(6)   a√σ = 0.1179(5)   m/√σ
            10^7 sweeps        3·10^6 sweeps     3·10^6 sweeps
A2   4      2.423(34)          1.5766(96)        1.183(16)         10.01(16)
     4      2.638(85)          1.804(42)         1.395(57)         11.91(45)
     4      2.89(11)           1.856(86)         1.468(77)         12.22(69)
     4      3.01(15)           2.199(44)         1.698(44)         14.98(49)
E    3      2.552(70)          1.663(23)         1.2587(51)        10.84(14)
     1      2.558(43)          1.813(20)         1.360(13)         11.95(18)
     3      2.718(63)          1.855(28)         1.350(23)         11.78(27)

Table 1: The lightest states in the A2 and E lattice irreducible representations, for 2+1D SU(2) on 16^3, 24^3 and 32^3 lattices for the three values of β respectively. The string tension values are taken from [1].
β = 6.0, 16^3 × 36   m_eff^{1-lev}(2.5a)   m_eff^{2-lev}(2.5a)   ξ_2-level/ξ_1-level
                     4.16·10^5 sweeps      15.04·10^5 sweeps
A1++                 0.7106(87)            0.7248(55)            0.69
E++                  1.078(16)             1.0776(63)            1.80
T++                  1.605(90)             1.612(18)             6.55

β = 6.2, 24^3 × 32   m_eff^{1-lev}(3.5a)   m_eff^{2-lev}(3.5a)   ξ_2-level/ξ_1-level
                     2·10^5 sweeps         9.28·10^5 sweeps
A1++                 0.531(12)             0.5273(62)            0.83
E++                  0.768(22)             0.7819(64)            2.46
T++                  0.99(15)              1.250(27)             6.60

β = 6.2, 24^3 × 32   m_eff^{1-lev}(2.5a)   m_eff^{2-lev}(2.5a)   ξ_2-level/ξ_1-level
                     2·10^5 sweeps         9.28·10^5 sweeps
A1++                 0.5269(77)            0.5369(52)            0.48
E++                  0.8079(99)            0.8026(43)            1.12
T++                  1.260(39)             1.294(11)             2.31

β = 6.4              m_eff^{UKQCD}(2.5a)            m_eff^{2-lev}(2.5a)                 ξ_2-level/ξ_1-level
                     V = 32^4: 0.322·10^5 sweeps    V = 32^3 × 48: 1.11·10^5 sweeps
A1++                 0.415(14)                      0.4000(73)                          0.64
E++                  0.620(17)                      0.5894(72)                          1.08
T++                  1.06(8)                        0.946(10)                           12.4

Table 2: Comparison of local effective masses using the ordinary 1-level and the 2-level algorithms in 3+1D SU(3). The ratios of efficiencies ξ, representing the inverse ratio of CPU time required for fixed accuracy, are given in the last column. In the last case, the statistics of the 2-level run were scaled up by 1.5 in the efficiency computation to take the different volume into account.
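The efficiency ratios in Tables 2 and 3 can be recomputed from the error bars and the cost in sweeps, assuming that the statistical variance falls like 1/cost, so that ξ ∝ 1/(δ² × cost). This is our reading of "inverse ratio of CPU time required for fixed accuracy"; it reproduces, for instance, 0.69 for A1++ at β = 6.0 and 4.0 for the k = 2 string at L = 16. Since the quoted errors are rounded, recomputed ratios agree with the tables only within that rounding, and the 1.5 volume factor of the β = 6.4 comparison is not included here. The function name is hypothetical.

```python
def efficiency_ratio(err_1, cost_1, err_2, cost_2):
    """xi_2 / xi_1, where xi = 1 / (err**2 * cost).

    With variance scaling like 1/cost, err**2 * cost is the cost needed to reach
    unit variance; the inverse ratio of these costs is the efficiency gain.
    """
    return (err_1 ** 2 * cost_1) / (err_2 ** 2 * cost_2)

# A1++ at beta = 6.0 (Table 2): 1-level 0.7106(87) with 4.16e5 sweeps,
# 2-level 0.7248(55) with 15.04e5 sweeps
gain_a1 = efficiency_ratio(0.0087, 4.16e5, 0.0055, 15.04e5)

# k = 2 string at L = 16 (Table 3): 1-level 1.74(22) with 1.2e5 sweeps,
# 2-level 1.729(30) with 16e5 sweeps
gain_k2 = efficiency_ratio(0.22, 1.2e5, 0.030, 16e5)
```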
string   state   1-level, 1.2·10^5 sweeps      2-level, 16·10^5 sweeps   ξ_2-level/ξ_1-level
length           m_eff(1.5a)   m_eff(2.5a)     m_eff(2.5a)
L = 16   k = 1   1.0130(82)    1.001(21)       1.003(16)                 0.13
         k = 2   1.739(37)     1.74(22)        1.729(30)                 4.0
         k = 3   2.24(10)      /               2.216(66)                 /
         k = 4   2.63(25)      /               2.32(12)                  /
L = 20   k = 1   1.292(17)     1.359(66)       1.288(19)                 0.90
         k = 2   2.27(11)      1.92(77)        2.174(59)                 13
         k = 3   2.63(27)      /               2.53(23)                  /
         k = 4   3.01(72)      /               3.04(33)                  /

Table 3: The masses of the four k-strings in 2+1D SU(8) gauge theory, V = 16 × 20 × 24, β = 115.0, measured with two different algorithms (n = 1000, Δ = 4 for the 2-level case).



state      β = 10.90          β = 11.10          β = 11.50
           L = 12             L = 16             L = 24
           1.7·10^5 sweeps    0.83·10^5 sweeps   3.6·10^5 sweeps
am_1       0.610(19)          0.592(13)          0.4638(71)
am_1*      0.728(75)          0.700(25)          0.680(20)
am_2       0.823(25)          0.820(17)          0.616(16)
am_2*      1.187(81)          0.991(56)          0.913(35)
a²σ_1      0.0581(18)         0.04109(90)        0.02114(32)
a²σ_2      0.0759(23)         0.0553(11)         0.02748(71)
σ_2/σ_1    1.306(56)          1.346(40)          1.300(39)

Table 4: The masses of k = 1 and k = 2 strings in 3+1D SU(4) gauge theory; the string tensions are computed assuming the Lüscher correction, m = σL − π/(3L), and statistical independence is assumed in the error bar on the ratio σ_2/σ_1.
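The string tensions and their ratio in Table 4 follow from the masses by inverting the relation in the caption, am = a²σL − π/(3L) with L in lattice units, and by adding relative errors in quadrature. As a check, this reproduces a²σ_1 = 0.0581 and a²σ_2 = 0.0759 at β = 10.90 and the ratio 1.306(56). Only central values of a²σ are recomputed here; the quoted errors on a²σ are taken as input for the ratio. The function names are hypothetical.

```python
import math

def string_tension_from_mass(am, L):
    """a^2 sigma from a flux-tube mass am at length L (lattice units),
    assuming am = a^2 sigma L - pi / (3 L)."""
    return (am + math.pi / (3.0 * L)) / L

def ratio_with_error(x2, dx2, x1, dx1):
    """x2 / x1 with relative errors added in quadrature (statistical independence)."""
    r = x2 / x1
    return r, r * math.sqrt((dx2 / x2) ** 2 + (dx1 / x1) ** 2)

sigma1 = string_tension_from_mass(0.610, 12)  # beta = 10.90, k = 1
sigma2 = string_tension_from_mass(0.823, 12)  # beta = 10.90, k = 2
ratio, dratio = ratio_with_error(0.0759, 0.0023, 0.0581, 0.0018)
```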
Figure 1: Jackknife-bin-size dependence of the statistical error on the A2 correlator (top) and its local effective mass (bottom). The separation of the fixed time-slices is Δ = 4. For 2+1D SU(2), at β = 12, V = 32^3, n = 100 and N_bc = 1400.







Figure 2: The variance of the correlator (top) and of the local effective mass (bottom), as a function of the number n of measurements under fixed boundary conditions, for fixed computing time. The separation of the fixed time-slices is Δ = 4 on the left and Δ = 8 on the right. The operator is a linear combination of fuzzy magnetic Wilson loops lying in the A2 irreducible representation of the square lattice. For 2+1D SU(2), at β = 12, V = 32^3.

Figure 3: A2-correlator variance as a function of the Euclidean-time separation t, for different numbers n of measurements under fixed boundary conditions and separations Δ of the fixed time-slices. The computing time is the same for all points. For 2+1D SU(2), at β = 12, V = 32^3.
Figure 4: The local effective mass of various correlators, and of the variance on the latter, as a function of the Euclidean-time separation t. The distance between fixed time-slices is Δ = 8 and the number of measurements under fixed boundary conditions is n = 1600 for the top plot and n = 200 for the bottom plot. For 2+1D SU(2), at β = 12, V = 32^3.
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(3=6 inverse efficiency l/t, 
Operator A 3 at t = 2 




(3=9 inverse efficiency l/t, 
Operator A 3 at t = 3 
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Figure 5: A 3 inverse efficiency and its predictor u in 2+1D 5(7(2). 
[Figure 6 panels: β = 6, inverse efficiency for operator A2 at t = 2; β = 9, inverse efficiency 1/ξ for operator A2 at t = 3.]

Figure 6: A2 inverse efficiency and its predictor ω in 2+1D SU(2).
Figure 7: A1 correlator (top) and LEM (bottom) efficiency curves using various methods of VEV subtraction, in 2+1D SU(2).
[Figure 8 panel: β = 6.0, operators at t = 4; horizontal axis: weighting parameter a from 0 to 1.]

Figure 8: Local-effective-mass variance for three different states, as a function of the weighting parameter a (cf. section 5.2). The A1 and E curves have been rescaled as indicated. For 3+1D SU(3) at β = 6.0, V = 16^3 × 36.